A scalable, parallel algorithm for maximal clique enumeration

Authors

Matthew C. Schmidt

Nagiza F. Samatova

Kevin Thomas

Byung-Hoon Park

Abstract

The problem of maximal clique enumeration (MCE) is to enumerate all of the maximal cliques in a graph. Once enumerated, maximal cliques are widely used to solve problems in areas such as 3-D protein structure alignment, genome mapping, gene expression analysis, and detection of social hierarchies. Even the most efficient serial MCE algorithms require large amounts of time to enumerate the maximal cliques in networks arising from these problems that contain hundreds, thousands, or larger numbers of vertices. Previous attempts to provide practical solutions to the MCE problem through parallel implementation have had limited success, largely due to a number of challenges inherent to the nature of the MCE combinatorial search space. On the one hand, MCE algorithms often create a backtracking search tree whose structure is highly irregular and hard or impossible to predict; therefore, almost any static decomposition of the search tree by parallel processors results in highly unbalanced processor execution times. On the other hand, the data-intensive nature of the MCE problem often makes naive dynamic load distribution strategies that require extensive data movement prohibitively expensive. As a result, good scaling of the overall execution time of parallel MCE algorithms has been reported for only up to a couple hundred processors. In this paper, we propose a parallel, scalable, and memory-efficient MCE algorithm for distributed and/or shared memory high performance computing architectures, whose runtime scales linearly for thousands of processors on real-world application graphs with hundreds and thousands of nodes.
Its scalability and efficiency are attributed to the proposed: (a) representation of the search tree decomposition to enable parallelization; (b) parallel depth-first backtracking search to both constrain the search space and minimize memory requirement; (c) least stringent synchronization to minimize data movement; and (d) on-demand work stealing intelligently coupled with work stack splitting to minimize computing elements' idle time. To the best of our knowledge, the proposed parallel MCE algorithm is the first to achieve a linear scaling runtime using up to 2048 processors on Cray XT machines for a number of real-world biological networks. Published by Elsevier Inc.
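To make the ideas of a depth-first backtracking search, a per-worker work stack, and on-demand work stealing with stack splitting more concrete, here is a minimal Python sketch. It is not the authors' algorithm: it uses the classic Bron-Kerbosch recursion without pivoting as a stand-in, simulates the workers sequentially in one process, and all names (`expand`, `split_stack`, `enumerate_maximal_cliques`, `num_workers`) are hypothetical. The steal policy (take the bottom half of the fullest stack) is likewise an assumption chosen for illustration.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

Graph = Dict[int, Set[int]]
# One node of the backtracking search tree:
# (R, P, X) = current clique, candidate vertices, already-explored vertices.
WorkItem = Tuple[FrozenSet[int], FrozenSet[int], FrozenSet[int]]


def expand(graph: Graph, item: WorkItem, stack: List[WorkItem],
           cliques: List[FrozenSet[int]]) -> None:
    """Process one search-tree node: report a clique or push its children."""
    r, p, x = item
    if not p and not x:
        cliques.append(r)  # nothing can extend R, so R is maximal
        return
    p_left, x_seen = set(p), set(x)
    for v in sorted(p):
        nbrs = graph[v]
        # Each child carries its own P and X, so it can be moved to
        # another worker without any further shared state.
        stack.append((r | {v}, frozenset(p_left & nbrs), frozenset(x_seen & nbrs)))
        p_left.remove(v)
        x_seen.add(v)


def split_stack(stack: List[WorkItem]) -> List[WorkItem]:
    """Remove and return the bottom half of a stack (assumed steal policy)."""
    half = len(stack) // 2
    stolen = stack[:half]
    del stack[:half]
    return stolen


def enumerate_maximal_cliques(graph: Graph, num_workers: int = 2) -> List[FrozenSet[int]]:
    """Sequentially simulate num_workers workers, each with its own stack."""
    stacks: List[List[WorkItem]] = [[] for _ in range(num_workers)]
    stacks[0].append((frozenset(), frozenset(graph), frozenset()))
    cliques: List[FrozenSet[int]] = []
    while any(stacks):
        for stack in stacks:
            if not stack:
                # Idle worker: steal on demand from the fullest stack.
                victim = max(stacks, key=len)
                stack.extend(split_stack(victim))
            if stack:
                expand(graph, stack.pop(), stack, cliques)  # depth-first step
    return cliques


# Example: a triangle 0-1-2 with a pendant edge 2-3.
example_graph: Graph = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
example_cliques = enumerate_maximal_cliques(example_graph, num_workers=2)
```

Because every work item is a self-contained `(R, P, X)` triple, moving items between stacks does not change the set of cliques found. That property is what makes stack splitting safe in this sketch. A real distributed implementation would also need communication and termination detection, which are omitted here.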

Related articles

Mining maximal cliques from a large graph using MapReduce: Tackling highly uneven subproblem sizes

We consider Maximal Clique Enumeration (MCE) from a large graph. A maximal clique is perhaps the most fundamental dense substructure in a graph, and MCE is an important tool to discover densely connected subgraphs, with numerous applications to data mining on web graphs, social networks, and biological networks. While effective sequential methods for MCE are known, scalable parallel methods for...

Coupling graph perturbation theory with scalable parallel algorithms for large-scale enumeration of maximal cliques in biological graphs

Data-driven construction of predictive models for biological systems faces challenges from data intensity, uncertainty, and computational complexity. Data-driven model inference is often considered a combinatorial graph problem where an enumeration of all feasible models is sought. The data-intensive and the NP-hard nature of such problems, however, challenges existing methods to meet the requ...

Scalable Graph-Mining Techniques with Applications to Systems Biology

SCHMIDT, MATTHEW C. Scalable Graph-Mining Techniques with Applications to Systems Biology. (Under the direction of Nagiza F. Samatova.) Genetic engineers often seek to modify the genome of prokaryotic organisms in order to improve their efficiency in certain industrial processes. This requires an understanding of the biological systems that are responsible for the expression of the organism’s p...

On the Relative Efficiency of Maximal Clique Enumeration Algorithms, with Application to High-Throughput Computational Biology

The efficient enumeration of maximal cliques has applications in microarray analysis and a number of other foundational problems of computational biology. In this paper, we analyze and test existing maximal clique enumeration algorithms for various classes of graphs. The classic branch and bound algorithm of Bron and Kerbosch proves to be relatively fast for sparse graphs, but slows considerabl...

An Important Corollary for the Fast Solution of Dynamic Maximal Clique Enumeration Problems

In this paper we modify an algorithm for updating a maximal clique enumeration after an edge insertion to provide an algorithm that runs in linear time with respect to the number of cliques containing one of the edge’s endpoints, whereas existing algorithms take quadratic time.

Journal:

J. Parallel Distrib. Comput.

Volume 69

Publication date: 2009
